UNGRADE: UNsupervised GRAph DEcomposition

Authors

  • Bruno Golénia
  • Sebastian Spiegler
  • Peter A. Flach
Abstract

This article presents an unsupervised algorithm for word decomposition, called UNGRADE (UNsupervised GRAph DEcomposition), that segments word lists in any language. UNGRADE assumes that each word has the structure prefixes-stem-suffixes, with no limit on the number of prefixes or suffixes. The algorithm works in three steps and is language independent. First, a pseudo stem is found for each word using a window based on Minimum Description Length (MDL). Second, prefix sequences and suffix sequences are found independently using a graph algorithm called graph-based unsupervised sequence segmentation. Finally, the morphemes from the previous steps are joined to produce a segmented word list. Because we focus purely on the segmentation of words, we use a trivial labeling method: each morpheme is labeled with its own segment. UNGRADE is applied to five languages (English, German, Finnish, Turkish and Arabic), and results are reported against each language's gold standard.
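The three steps above can be sketched end to end as follows. This is a minimal Python sketch under stated assumptions, not the authors' implementation. The abstract does not give the description-length formula, so `part_cost` uses a hypothetical cost: a smoothed Shannon code length plus a spelling cost shared by every word that uses the piece. Step 2 is replaced by `greedy_segment`, a plain longest-match split against a supplied affix inventory. This is a swapped-in stand-in, not graph-based unsupervised sequence segmentation. The tab-separated line produced by `format_line` is an assumed output format. All function names are hypothetical.

```python
import math
from collections import Counter

# --- Step 1 (sketch): pseudo stem = window with the lowest HYPOTHETICAL description length ---

def build_counts(words):
    """Count how many words start with, end with, or contain each string."""
    pre, suf, sub = Counter(), Counter(), Counter()
    for w in words:
        n = len(w)
        pre.update({w[:i] for i in range(n + 1)})  # "" is counted once for every word
        suf.update({w[i:] for i in range(n + 1)})
        sub.update({w[i:j] for i in range(n) for j in range(i + 1, n + 1)})
    bits_per_char = math.log2(max(len({c for w in words for c in w}), 2))
    return pre, suf, sub, len(words), bits_per_char


def part_cost(piece, count, n, bits_per_char):
    """Hypothetical two-part code length, in bits, for one piece of a word."""
    if not piece:
        return 0.0
    pointer = -math.log2((count + 1) / (n + 1))            # smoothed Shannon code length
    spelling = len(piece) * bits_per_char / max(count, 1)  # lexicon entry shared by its users
    return pointer + spelling


def split_cost(prefix, window, suffix, counts):
    pre, suf, sub, n, bpc = counts
    return (part_cost(prefix, pre[prefix], n, bpc)
            + part_cost(window, sub[window], n, bpc)
            + part_cost(suffix, suf[suffix], n, bpc))


def pseudo_stem(word, counts, min_len=3):
    """Try every window of length >= min_len and keep the cheapest split.
    Words no longer than min_len are returned whole as their own window."""
    if len(word) <= min_len:
        return "", word, ""
    best_cost, best = math.inf, None
    for start in range(len(word)):
        for end in range(start + min_len, len(word) + 1):
            split = (word[:start], word[start:end], word[end:])
            cost = split_cost(*split, counts)
            if cost < best_cost:
                best_cost, best = cost, split
    return best


# --- Step 2 STAND-IN: not the paper's graph-based unsupervised sequence segmentation ---

def greedy_segment(affix_string, inventory):
    """Split an affix string by longest match against a known inventory;
    characters that match nothing become one-character segments."""
    out, i = [], 0
    while i < len(affix_string):
        for j in range(len(affix_string), i, -1):
            if affix_string[i:j] in inventory:
                out.append(affix_string[i:j])
                i = j
                break
        else:  # no inventory entry starts at position i
            out.append(affix_string[i])
            i += 1
    return out


# --- Step 3: join the pieces; each morpheme is labelled with its own surface string ---

def segment_word(word, counts, prefix_inventory, suffix_inventory, min_len=3):
    prefix, stem, suffix = pseudo_stem(word, counts, min_len)
    return (greedy_segment(prefix, prefix_inventory) + [stem]
            + greedy_segment(suffix, suffix_inventory))


def format_line(word, morphemes):
    """Assumed output format: word, a tab, then space-separated morpheme labels."""
    return word + "\t" + " ".join(morphemes)


if __name__ == "__main__":
    words = ["walk", "walks", "walked", "walking", "talk", "talks", "talked", "talking"]
    counts = build_counts(words)
    for w in words:
        print(format_line(w, segment_word(w, counts, set(), {"s", "ed", "ing"})))
```

The window search tries every window of at least `min_len` characters, so its cost grows quadratically with word length. On a toy word list the hypothetical cost need not pick linguistically sensible stems. The sketch shows the shape of the three-step pipeline, not the paper's results.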

Similar resources

Unsupervised Morpheme Discovery with Ungrade

In this paper, we present an unsupervised algorithm for morpheme discovery called UNGRADE (UNsupervised GRAph DEcomposition). UNGRADE works in three steps and can be applied to languages whose words have the structure prefixes-stem-suffixes. In the first step, a stem is obtained for each word using a sliding window, such that the description length of the window is minimised. In the next step p...

Distinct edge geodetic decomposition in graphs

Let $G=(V,E)$ be a simple connected graph of order $p$ and size $q$. A decomposition of a graph $G$ is a collection $\pi$ of edge-disjoint subgraphs $G_1, G_2, \ldots, G_n$ of $G$ such that every edge of $G$ belongs to exactly one $G_i$ $(1 \le i \le n)$. The decomposition $\pi = \{G_1, G_2, \ldots, G_n\}$ of a connected graph $G$ is said to be a distinct edge geodetic decomposition if $g_1(G_i) \ne g_1(G_j)$ $(1 \le i \ne j \le n)$. The maximum cardinality of $\pi$...

Some Advances in Role Discovery in Graphs

Role discovery in graphs is an emerging area that allows analysis of complex graphs in an intuitive way. In contrast to other graph problems such as community discovery, which finds groups of highly connected nodes, the role discovery problem finds groups of nodes that share similar graph topological structure. However, existing work so far has two severe limitations that prevent its use in som...

Graph Clustering by Hierarchical Singular Value Decomposition with Selectable Range for Number of Clusters Members

Graphs have many applications in real-world problems. When we deal with a huge volume of data, analyzing it is difficult or sometimes impossible. In big data problems, clustering is a useful tool for data analysis. Singular value decomposition (SVD) is one of the best algorithms for clustering graphs, but it does not let us choose the number of clusters or the number of members ...

Mixed cycle-E-super magic decomposition of complete bipartite graphs

An $H$-magic labeling in an $H$-decomposable graph $G$ is a bijection $f : V(G) \cup E(G) \to \{1, 2, \ldots, p + q\}$ such that for every copy $H$ in the decomposition, $\sum_{v \in V(H)} f(v) + \sum_{e \in E(H)} f(e)$ is constant. $f$ is said to be $H$-$E$-super magic if $f(E(G)) = \{1, 2, \ldots, q\}$. A family of subgraphs $H_1, H_2, \ldots, H_h$ of $G$ is a mixed cycle-decomposition of $G$ if every subgraph $H_i$ is isomorphic to some cycle $C_k$, for $k \ge$ ...
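The H-magic and H-E-super magic conditions in the mixed cycle-E-super magic abstract above can be checked mechanically for a given labeling. What follows is a minimal Python sketch under a hypothetical representation. Each copy of H is a (vertex set, edge list) pair, and `f` maps vertices and edges (edges keyed as `frozenset` pairs) to integers. It does not verify that the copies form a valid decomposition or that they are cycles. The toy example decomposes the path a-b-c into two copies of K_2, not into cycles. The function names are illustrative only.

```python
def is_h_magic(vertices, edges, copies, f):
    """Check the H-magic condition for a given labeling f.

    vertices: iterable of vertex names; edges: iterable of 2-tuples;
    copies: list of (vertex_set, edge_list) pairs, one per copy of H;
    f: dict from vertices and frozenset edges to integer labels.
    """
    vertices = list(vertices)
    edge_keys = [frozenset(e) for e in edges]
    p, q = len(vertices), len(edge_keys)
    # f must be a bijection from V(G) ∪ E(G) onto {1, ..., p + q}.
    labels = [f[v] for v in vertices] + [f[e] for e in edge_keys]
    if sorted(labels) != list(range(1, p + q + 1)):
        return False
    # Every copy of H must give the same vertex-label sum plus edge-label sum.
    sums = {sum(f[v] for v in vs) + sum(f[frozenset(e)] for e in es) for vs, es in copies}
    return len(sums) == 1


def is_h_e_super_magic(vertices, edges, copies, f):
    """H-magic, and the edges receive exactly the labels {1, ..., q}."""
    edges = list(edges)
    edge_labels = sorted(f[frozenset(e)] for e in edges)
    return (is_h_magic(vertices, edges, copies, f)
            and edge_labels == list(range(1, len(edges) + 1)))


if __name__ == "__main__":
    # Toy graph: path a-b-c, decomposed into two copies of H = K_2.
    V = ["a", "b", "c"]
    E = [("a", "b"), ("b", "c")]
    copies = [({"a", "b"}, [("a", "b")]), ({"b", "c"}, [("b", "c")])]
    f = {frozenset(("a", "b")): 1, frozenset(("b", "c")): 2, "c": 3, "a": 4, "b": 5}
    print(is_h_e_super_magic(V, E, copies, f))  # True: both copies sum to 10
```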

Publication date: 2009